Audio visual joint action recognition based on key frame selection network
Tingxiu CHEN, Jianqin YIN
Journal of Computer Applications    2022, 42 (3): 731-735.   DOI: 10.11772/j.issn.1001-9081.2021060995

In recent years, action recognition based on audio-visual joint learning has received some attention. In both video (the visual modality) and audio (the auditory modality), an action occurs only briefly, and only the information within the time span of the action clearly expresses the action category. How to make better use of the salient information carried by the key frames of the audio-visual modalities is one of the problems to be solved in audio-visual action recognition. To address this problem, a key frame selection network, KFIA-S, was proposed. Through a linear temporal attention mechanism based on a fully connected layer, different weights were assigned to the audio-visual information at different times, so as to select the audio-visual features beneficial to video classification, reduce redundant information, suppress background interference, and improve the accuracy of action recognition. The effect of different intensities of temporal attention on action recognition was also studied. Experiments on the ActivityNet dataset show that the KFIA-S network achieves State-Of-The-Art (SOTA) recognition accuracy, which demonstrates the effectiveness of the proposed method.
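
The idea of weighting time steps with a fully connected layer can be illustrated with a minimal NumPy sketch. This is not the KFIA-S implementation: the function name `temporal_attention_pool`, the softmax normalization over time, and the `intensity` factor (used here as a stand-in for attention strength) are all assumptions made for illustration, and the abstract does not specify them.

```python
import numpy as np

def temporal_attention_pool(features, w, b=0.0, intensity=1.0):
    """Weight per-time-step features with a linear (fully connected) scorer.

    features:  array of shape (T, D), one fused audio-visual vector per time step
    w, b:      parameters of a single fully connected layer mapping D -> 1
    intensity: hypothetical sharpness factor; larger values concentrate weight
               on fewer time steps (an assumption, not the paper's definition)
    """
    scores = features @ w + b                  # (T,) one scalar score per time step
    scaled = intensity * scores
    scaled = scaled - scaled.max()             # stabilise the exponentials
    weights = np.exp(scaled) / np.exp(scaled).sum()  # softmax over time (assumed)
    pooled = weights @ features                # (D,) attention-weighted summary
    return pooled, weights

# Illustrative random inputs; real features would come from audio/visual encoders.
rng = np.random.default_rng(0)
T, D = 8, 16
features = rng.normal(size=(T, D))
w = rng.normal(size=D)
pooled, weights = temporal_attention_pool(features, w, intensity=2.0)
```

In this sketch, setting `intensity` to 0 gives every time step the same weight, which reduces to plain average pooling, while larger values push the summary toward the few highest-scoring time steps.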
